1 Data sources

1. Where the Data was found

The data was sourced from the Victoria Health Coronavirus website, which holds data on all coronavirus cases in Victoria since 2020. The site offers data sets that anyone can download and analyse, and it posts updated information frequently.

Of the data sets available for download, the one chosen for this analysis was the Victorian case data, which contains ‘all data of PCR and RAT cases by Local Government Area and postcode’. It was chosen primarily because it contains more variables than the alternatives, which gives more flexibility and control over the analysis and results. This supports the objective of the analysis and makes the results easier to interpret.

This data is protected by copyright and released under the Creative Commons Attribution 4.0 International licence. Therefore, when using this data we must follow the terms of that licence, in particular the requirement to attribute the source.

2. Observational or Experimental Data

This is observational data, as the Victoria Health department specified that the data was gathered through contact tracing and management of the COVID-19 outbreak. This indicates that researchers and health professionals did not control or assign the factors affecting the variables; they recorded cases as they occurred and can only attempt to determine correlations between factors. In this scenario the observations are the coronavirus cases, that is, the people affected by the coronavirus.

4. Unit and potential Unique Identifier

In this data set the “postcode” can be considered a potential identifier: although a postcode alone is not unique to one person, it can make an individual easy to identify, especially when combined with the diagnosis date and case count.

5. Storing and Analysing the Data

The data set in its current form can be stored and used to analyse the research question; however, the only variables essential to this analysis are the date of diagnosis and the number of cases on each day. Therefore, to keep the report concise and accurate, we will remove all columns except “diagnosis_date” and create a new column, “total_cases”, giving the total number of cases on each day within the period of interest.
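The reduction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not show its own code, the sample rows are invented, the file is assumed to hold one row per diagnosed case, and the helper name `daily_totals` and the `postcode` key are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical sample rows, one per diagnosed case (an assumption about the
# file's shape). Every column other than "diagnosis_date" is ignored below.
rows = [
    {"diagnosis_date": "2020-06-30", "postcode": "3000"},
    {"diagnosis_date": "2020-06-30", "postcode": "3021"},
    {"diagnosis_date": "2020-07-02", "postcode": "3064"},
]


def daily_totals(rows, start, end):
    """Return one record per day in [start, end] with its total case count.

    Days with no diagnosed cases are kept with a count of zero so that the
    resulting series has no gaps; cases outside the window are ignored.
    """
    counts = Counter(date.fromisoformat(row["diagnosis_date"]) for row in rows)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=k) for k in range(n_days)]
    return [{"diagnosis_date": d, "total_cases": counts[d]} for d in days]


table = daily_totals(rows, date(2020, 6, 30), date(2020, 7, 2))
for record in table:
    print(record["diagnosis_date"].isoformat(), record["total_cases"])
```

Keeping zero-case days in the table matters for the later steps, because a 7-day comparison needs a value for every calendar day.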

2 🔍 Analysis

6. Daily Cases over Time of Interest

Figure 2.1: Time series of Covid cases from 30th June to the 13th of September 2020

From this figure we see that the overall trend in the number of diagnosed COVID-19 cases is increasing from the 1st of June to the 13th of September. The two points highlighted in red mark the intervention dates in question, the 30th of June 2020 and the 23rd of July 2020 respectively.

7. The 7-Day Growth Rate

Figure 2.2: Growth Rate of Covid cases within the lockdown period of 30th June to 29th July 2020

Since we are concerned with the effects of local lockdowns, the growth rate above is restricted to the lockdown period within the investigation period. According to Davey (2020), the lockdown at that time was imposed from the 30th of June to the 29th of July 2020. The growth rate is calculated on a weekly basis: for each date, it is calculated using the number of cases 7 days prior to that date. The two dates of intervention are highlighted in red.
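The source does not state the exact formula behind the weekly growth rate. One common reading, assumed here, is the relative change between the count on a given day and the count 7 days earlier. The Python sketch below uses that definition; the function name `seven_day_growth` and the sample counts are hypothetical.

```python
def seven_day_growth(totals, lag=7):
    """Relative change in daily cases versus the count `lag` days earlier.

    totals: daily case counts in date order.
    Returns a list of the same length; entries are None where the rate is
    undefined (fewer than `lag` days of history, or a zero baseline).
    Assumed definition: (cases_today - cases_lag_days_ago) / cases_lag_days_ago.
    """
    rates = []
    for index, today in enumerate(totals):
        if index < lag or totals[index - lag] == 0:
            rates.append(None)
        else:
            baseline = totals[index - lag]
            rates.append((today - baseline) / baseline)
    return rates


# Invented counts for nine consecutive days, for illustration only.
sample_totals = [10, 12, 9, 14, 11, 13, 15, 20, 18]
sample_rates = seven_day_growth(sample_totals)
print(sample_rates[7], sample_rates[8])
```

With this definition the first seven days have no growth rate, which is why a figure restricted to the lockdown period still depends on counts from the week before it began.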

The first thing to note from Figure 2.2 is that the growth rate on the 30th of June reflects the growth of COVID-19 cases in the week before the lockdown was imposed. This point can be considered a reference point for understanding the effectiveness of the lockdown.

Despite an initial spike in the growth rate of cases, the rate subsided towards the end of the lockdown. The growth rate on the second date of intervention, the 23rd of July, was lower than the previously calculated growth rate. Although not the lowest, it was one of the lowest calculated up to that point and is considerably lower than on other dates throughout the lockdown period. This suggests the lockdown was relatively effective in this period.

8. Relationship between Growth Rate and Effective Transmission Rate

9. Estimating R(0) and Change in R(0) due to intervention

Let us consider the SEIR model for coronavirus specifically. The abbreviation SEIR stands for:

S - People that are Susceptible

E - People that are Exposed but not infectious

I - People that are Infected and infectious

R - People that have Recovered

This model is used to estimate the number of people affected by COVID-19 in a given region or area. It helps determine how contagious the virus is, or can become over time, along with the recovery rate and infectious period, which in turn helps officials make informed decisions to combat the virus.

It is understood that S+E+I+R=1, where each term is the proportion of the population in that category. This means the four categories together make up the entire population of the region the SEIR model is applied to. It makes clear that every member of the population is accounted for in the model. Additionally, it ensures that no individual is counted twice. For example, once a person recovers from the coronavirus, they are counted only in the R (Recovered) category and no longer in the I (Infected and infectious) category or any other category.
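The constraint S+E+I+R=1 can be illustrated with a small Python sketch of a standard SEIR model, stepped forward one day at a time with a simple Euler update. The parameter values (`beta`, `sigma`, `gamma`) and the starting proportions are hypothetical placeholders chosen for illustration; they are not estimates from the Victorian data.

```python
def seir_step(s, e, i, r, beta, sigma, gamma, dt=1.0):
    """Advance the SEIR proportions by one Euler step of length dt (days).

    beta:  transmission rate (S -> E)
    sigma: rate of becoming infectious (E -> I), 1 / latent period
    gamma: recovery rate (I -> R), 1 / infectious period
    Each flow leaves one compartment and enters the next, so the total
    S + E + I + R is unchanged by the step.
    """
    new_exposed = beta * s * i * dt
    new_infectious = sigma * e * dt
    new_recovered = gamma * i * dt
    return (
        s - new_exposed,
        e + new_exposed - new_infectious,
        i + new_infectious - new_recovered,
        r + new_recovered,
    )


# Hypothetical starting point: 0.1% of the population infectious.
state = (0.999, 0.0, 0.001, 0.0)
for _ in range(60):
    state = seir_step(*state, beta=0.5, sigma=0.2, gamma=0.1)

print("S, E, I, R after 60 days:", [round(x, 4) for x in state])
print("sum of compartments:", round(sum(state), 10))
```

Because every person leaving one compartment enters exactly one other, nobody is lost or counted twice, which is the property the paragraph above describes.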

Since we are interested in R(0) on the intervention dates, the SEIR model will be calculated using the growth rate on those two dates, that is, the growth rate on the 30th of June and on the 23rd of July 2020. From there we can compute the effective transmission number and other required values.
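The source does not show how the growth rate is converted into a transmission number. One standard relation for an SEIR model with exponentially distributed latent and infectious periods is R = (1 + r·T_E)(1 + r·T_I), where r is the exponential growth rate per day and T_E and T_I are the mean latent and infectious periods. The Python sketch below assumes that relation and also assumes the weekly growth rate is a relative change over 7 days, so that r = ln(1 + g)/7. The function name and the period lengths are hypothetical placeholders, not values taken from the report.

```python
import math


def reproduction_number(weekly_growth, latent_days, infectious_days, lag=7):
    """Approximate reproduction number implied by a weekly growth rate.

    weekly_growth:   relative change in cases over `lag` days (e.g. 0.5 = +50%)
    latent_days:     assumed mean latent period T_E (exposed, not infectious)
    infectious_days: assumed mean infectious period T_I
    Uses R = (1 + r * T_E) * (1 + r * T_I) with r = ln(1 + g) / lag.
    """
    if weekly_growth <= -1:
        raise ValueError("weekly_growth must be greater than -1")
    r = math.log1p(weekly_growth) / lag  # exponential growth rate per day
    return (1 + r * latent_days) * (1 + r * infectious_days)


# Placeholder periods (3 and 5 days) and invented growth rates.
for g in (0.5, 0.0, -0.5):
    print(g, round(reproduction_number(g, latent_days=3, infectious_days=5), 3))
```

Under these assumptions a growth rate of zero gives R = 1, positive growth gives R > 1, and negative growth gives R < 1. Comparing the value on the 30th of June with the value on the 23rd of July would then indicate the change associated with the intervention.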

3 📉 Data curation

The criteria I am following in this section are those laid out by Broman and Woo (2018).

  1. The first thing is to be consistent with the organization of your data. In this Covid-19 data set the variables consistently used were “diagnosed_date” and “Total_cases”.

  2. It is important to choose good, meaningful names for variables and files. In this data set “diagnosed_date” represents the date a patient was diagnosed as positive for Covid-19, while “Total_cases” represents the total number of cases diagnosed on each date.

  3. When writing dates it is

Resources

Broman, Karl W., and Kara H. Woo. 2018. “Data Organization in Spreadsheets.” The American Statistician 72 (1): 2–10. https://doi.org/10.1080/00031305.2017.1375989.
Davey, Melissa. 2020. “Melbourne Suburbs Lockdown Announced as Victoria Battles Coronavirus Outbreaks.” The Guardian. https://www.theguardian.com/australia-news/2020/jun/30/melbourne-hotspot-lockdowns-announced-as-victoria-battles-coronavirus-outbreaks.
Spinu, Vitalie, Garrett Grolemund, and Hadley Wickham. 2023. Lubridate: Make Dealing with Dates a Little Easier. https://CRAN.R-project.org/package=lubridate.
Wickham, Hadley. 2023. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2023. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.
Xie, Yihui. 2023. Bookdown: Authoring Books and Technical Documents with R Markdown. https://CRAN.R-project.org/package=bookdown.